ESSnet on common tools and harmonised methodology for SDC in the ESS

Task 3. Future directions of SDC software tools

Objectives

In previous projects major steps forward have been made in software development. The ARGUS software is now being used in several NSIs. However, it is essential to secure the future of ARGUS. At this moment the development depends mainly on a few researchers and developers at Statistics Netherlands. This is a very small basis, and a very undesirable one for software that is essential in the production process of several NSIs.
Another issue is the integration into the production process. Although ARGUS reads (and writes) very general file formats, there is a need for extension here. Metadata problems have also been reported. Although ARGUS also needs non-standard meta data, it has to be investigated whether SDMX and/or DDI could be used. For the special case of a SAS production environment, both Destatis and Statistics Sweden have built prototype software in SAS to better integrate τ-ARGUS into the production process, especially when dealing with sets of linked tables. Although SAS is an important extension, this should not exclude other options.
As cell suppression techniques rely on optimisation, the current version of τ-ARGUS requires a (commercial) LP-solver. Both XPress and CPlex, two top-brand solvers, can be used. Although these solvers are available at very reasonable prices, the need for a commercial licence hampers the widespread use of τ-ARGUS. For future versions of τ-ARGUS it must be considered whether freeware solvers should be supported as an alternative.
The current versions of ARGUS have been written for the Windows platform. Many NSIs use Windows PCs for their production, but other platforms are in use as well. It must be investigated whether the SDC tools can be made suitable for different platforms.
The ESSNet CORA (Common Reference Architecture) has defined a common information architecture for NSIs. Definition of a CORA interface for τ-ARGUS would offer the following advantages:
- Use of τ-ARGUS as a statistical service without the need of a wrapper;
- Definition of the place of the system within GSBPM;
- Interface to SDMX.
It has to be investigated whether and how the future SDC-tools should be made compliant with this architecture. For this purpose the Dutch CORA-partner has joined the project. We envisage co-ordinating this work with that of ESSNet-CORE. To allow a proper investigation of the work to be done, the current source code will be shared among the project partners involved.
Although this is very difficult, some estimates of the costs of migrating the current tools towards a more sustainable situation must be made, based on a description of the current structure of ARGUS.

Description of work

The aim of this work package is to describe the future directions of the development of SDC-tools, both for microdata and for tabular data. The following issues, in arbitrary order, have to be addressed:
  1. Required functionality
  2. User friendliness
  3. Sustainability
  4. Integration into the production process
  5. Documentation of current versions
  6. Maintenance/Governance model
  7. Platforms
  8. Meta data
  9. Data file formats
  10. LP-solvers

Sub-Tasks:

  1. Required functionality. The current versions of ARGUS, and also sdcMicro and sdcTable, offer solutions for certain SDC-problems. It has to be investigated which new directions can be expected and which new solutions must be offered. This does not have to be an exhaustive list. The future versions of the SDC tools need to be flexible, so that the developers can easily react to new methods becoming available.
  2. User friendliness. It makes a great difference whether tools are developed for use by the researchers themselves or for use by subject-matter statisticians. The goal is to investigate which improvements can be made to maximise user friendliness.
  3. Sustainability. The current versions of ARGUS are being used by a range of NSIs. However, the development is concentrated in a very small team at Statistics Netherlands, apart from contributions by other partners, mostly facilitated by previous projects. This is not an ideal situation. It sometimes slows down development, and it also prevents other interested researchers from contributing. One way or another we should move to a situation where the team of developers can be enlarged.
  4. Integration into the production process. Whereas Istat has chosen to create a tailor-made Oracle database to cope with the creation of linked tables, both Destatis and Statistics Sweden have made an attempt to integrate τ-ARGUS into a SAS environment. We will compare and document the two packages, studying options for joining forces in future. At the same time, integration into the SAS environment can serve as an example for the integration of tabular SDC into the production process in general, and help to define the issues. Studying this example can also help us learn about the issues of joining forces in software development when some ideas are shared by the partners and others are not. This will be a very valuable experience also with respect to 6 (Governance model for open software development).
  5. Documentation of current versions. Before we can consider the migration of the software tools to a new version, good documentation of the current versions is very much needed. This will also be very useful to future cooperating partners.
  6. Maintenance/Governance model. If the software tools become flexible, we must prevent chaos from arising. These tools are being used in the production environment of the NSIs. Therefore we must guarantee that the methods offered in the future SDC tools are of high quality, even if contributions from other researchers are included. This project must develop a structure for that.
  7. Platforms. The current version of ARGUS only runs on Windows PCs. This is a strict consequence of the choice of the development environment. sdcMicro and sdcTable, written in R, do not have that restriction. It must be investigated which platforms are being used by the NSIs; the outcome should guide the selection of the future development tools.
  8. Meta data. Although meta data issues are an almost endless story and we certainly do not plan to solve that issue in general, we have encountered several complaints that the SDC tools do not meet the standards used by the NSIs. A small investigation is needed in order to have an overview of the most urgent problems. This will lead us to further enhancements.
  9. Data file formats. At the level of file formats, too, there are complaints that the current SDC tools cannot easily be applied because cumbersome transformations are needed. It is difficult to get a full overview of these problems, but they must guide the architecture of future SDC tools.
  10. LP-solvers. Tabular data protection in particular relies heavily on optimisation-based solutions. This is true for cell suppression but also for perturbative methods. As the LP-problems can become very large and hence very hard to solve, we have used commercial solvers until now. This has always been a big hurdle to the widespread use of τ-ARGUS. It has to be reconsidered whether non-commercial alternatives can be used. A full test may be outside the scope of this project, but a first step must be made. The audit routine in τ-ARGUS is a possible test object.
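
To make the proposed test object in sub-task 10 more concrete, the following is a minimal sketch of the kind of problem an audit poses: for each suppressed cell, find the smallest and largest value it can take given the published totals and non-negativity. It is not the τ-ARGUS implementation. The function names (solve_square, audit_intervals) and the example table are hypothetical, and instead of calling an LP-solver the sketch enumerates basic solutions by brute force, which is only feasible for very small tables. It assumes linearly independent constraints and a bounded feasible region. A candidate non-commercial solver would be given the same minimisation and maximisation problems and could be checked against results obtained in this way.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(matrix, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(matrix)
    # Augmented matrix in exact rational arithmetic (Gauss-Jordan elimination).
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def audit_intervals(a_eq, b_eq):
    """Return (min, max) for each unknown over {x >= 0 : a_eq x = b_eq}.

    Assumption: the rows of a_eq are linearly independent and the feasible
    region is bounded, so each minimum and maximum is attained at a basic
    feasible solution. All such solutions are enumerated by brute force.
    """
    m, n = len(a_eq), len(a_eq[0])
    vertices = []
    for basis in combinations(range(n), m):
        sub = [[row[j] for j in basis] for row in a_eq]
        sol = solve_square(sub, b_eq)
        if sol is None or any(v < 0 for v in sol):
            continue  # singular basis or a negative cell value
        x = [Fraction(0)] * n
        for j, v in zip(basis, sol):
            x[j] = v
        vertices.append(x)
    if not vertices:
        raise ValueError("no feasible solution found")
    return [(min(v[k] for v in vertices), max(v[k] for v in vertices))
            for k in range(n)]


# Hypothetical example: four suppressed cells x11, x12, x21, x22 forming a
# 2x2 pattern. After subtracting the published cells from the totals:
#   x11 + x12 = 10   (row 1)
#   x21 + x22 = 20   (row 2)
#   x11 + x21 = 12   (column 1)
# The column 2 equation (x12 + x22 = 18) follows from these three and is
# left out to keep the constraint rows independent.
A_EQ = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
B_EQ = [10, 20, 12]

intervals = audit_intervals(A_EQ, B_EQ)
for name, (low, high) in zip(["x11", "x12", "x21", "x22"], intervals):
    print(f"{name}: [{low}, {high}]")
```

In this example the intervals are [0, 10] for x11 and x12, [2, 12] for x21 and [8, 18] for x22. An audit would then compare the width of each interval with the protection required for the cell concerned.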